Tree Seedling Survival Responses Data

Source

Original Dataset

Link to original authors: https://doi.org/10.5061/dryad.xd2547dpw

Citation: Wood, K., Kobe, R., Ibáñez, I., et al. (2023). Tree seedling functional traits mediate plant-soil feedback survival responses across a gradient of light availability. Dryad. https://doi.org/10.5061/dryad.xd2547dpw

Study Funded by: National Science Foundation: NSF DEB 145732, Michigan State University, and Alma College

About the data

Introduction to Data:

The purpose of the study was to see how functional traits mediate plant-soil feedbacks (PSFs) via seedling survival.

Methods:

This study was a factorial blocked design field experiment. 4 tree species (Acer saccharum, Prunus serotina, Quercus alba, and Quercus rubra), 7 soil sources (Acer saccharum, Prunus serotina, Quercus alba, Quercus rubra, Acer rebrum, Populus grandidentata, and Sterilized) and a variety of forest understory light levels were used. There were a total of 3,024 seedlings in this experiment. Survival was measured twice a week and at 3 weeks, some seedlings were randomly selected to measure mycorrhizal colonization, phenolics, lignin, and NSC.

Column information:

  • No: Seedling ID number
  • Plot: Plot the seedling was planted in (1-18)
  • Subplot: Subplot seedling was planted in (A-E, one per corner and one in the middle)
  • Species: Species of seedling (Acer saccharum, Prunus serotina, Quercus alba, and Quercus rubra)
  • Light_ISF: Light level based on HemiView software.
  • Light_Cat: Light levels were categorized into ‘Low’ ‘Med’ or ‘High’
  • Core: Year the soil core was taken, either 2016 or 2017
  • Soil: Which species the soil core was taken from. Includes all 4 species as well as Acer rubrum, Populus grandidentata and a sterilized conspecific of each species
  • Adult: Indicates which adult tree the soil core came from, up to 6 adults per species.
  • Sterile: Indicates whether the soil was steralized or not
  • Conspecific: Indicates if the soil core used was from it’s own species (conspecific), a different species (heterospecific) or steralized (sterile)
  • Myco: Mycorrhizal type of the seedling (AMF or EMF)
  • SoilMyco: Mycorrhizal type of the soil (AMF, EMF or Sterile)
  • PlantDate: Date each tree was planted
  • AMF: Measures the percent arbuscular mycorrhizal fungi colonization on fine roots of harvested seedlings
  • EMF: Measures the percent ectomycorrhizal fungi colonization on the root tips of harvested seedlings
  • Phenolics: Calculated as nmol Gallic acid equivalents per mg dry extract
  • NSC: Calculated as the percent dry mass non-structural carbohydrates
  • Lignin: Measured as percent dry mass lignin
  • Census: The census number when each seedling died or was harvested
  • Time: The number of days before the seedling died or was harvested
  • Event: Indicates whether the seedling died during the study (1 = dead, 0 = harvested or alive at end of study)
  • Harvest: Indicates whether the seedling was harvested during the study
  • Alive: Indicates whether the seedling was alive at the end of the study

Focus for Data Analysis

The focus for this data analysis will be on how different environmental factors affect the amount of phenolics, lignin, and NSC (non-structural carbohydrates) in seedlings of different tree species. The main factors that will be analyzed are Light_Cat, Soil, Conspecific, SoilMyco, AMF, and EMF.

Original Data

Output

Preview of Original Data

## Rows: 2,783
## Columns: 24
## $ No          <int> 126, 11, 12, 2823, 5679, 18, 25, 40, 26, 41, 30, 33, 34, 3…
## $ Plot        <int> 1, 1, 1, 7, 14, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Subplot     <chr> "C", "C", "C", "D", "A", "C", "A", "A", "A", "A", "C", "C"…
## $ Species     <chr> "Acer saccharum", "Quercus alba", "Quercus rubra", "Acer s…
## $ Light_ISF   <dbl> 0.106, 0.106, 0.106, 0.080, 0.060, 0.106, 0.108, 0.108, 0.…
## $ Light_Cat   <chr> "Med", "Med", "Med", "Med", "Low", "Med", "Med", "Med", "M…
## $ Core        <int> 2017, 2017, 2017, 2016, 2017, 2016, 2016, 2017, 2016, 2016…
## $ Soil        <chr> "Prunus serotina", "Quercus rubra", "Prunus serotina", "Pr…
## $ Adult       <chr> "I", "970", "J", "J", "689", "1332", "891", "1595", "1323"…
## $ Sterile     <chr> "Non-Sterile", "Non-Sterile", "Non-Sterile", "Non-Sterile"…
## $ Conspecific <chr> "Heterospecific", "Heterospecific", "Heterospecific", "Het…
## $ Myco        <chr> "AMF", "EMF", "EMF", "AMF", "AMF", "AMF", "EMF", "EMF", "E…
## $ SoilMyco    <chr> "AMF", "EMF", "AMF", "AMF", "AMF", "AMF", "EMF", "Sterile"…
## $ PlantDate   <chr> "6/11/18", "5/25/18", "5/31/18", "6/11/18", "6/11/18", "6/…
## $ AMF         <dbl> 22.00, 15.82, 24.45, 22.23, 21.15, 35.29, 24.00, 4.00, 28.…
## $ EMF         <dbl> NA, 31.07, 28.19, NA, NA, NA, 20.00, 0.00, 36.18, NA, 28.1…
## $ Phenolics   <dbl> -0.56, 5.19, 3.36, -0.71, -0.58, 0.30, 5.11, 3.43, 3.83, -…
## $ Lignin      <dbl> 13.86, 20.52, 24.74, 14.29, 10.85, 10.80, 18.82, 25.22, 26…
## $ NSC         <dbl> 12.15, 19.29, 15.01, 12.36, 11.20, 13.79, 22.51, 14.81, 14…
## $ Census      <int> 4, 33, 18, 4, 4, 7, 7, 7, 33, 7, 33, 21, 21, 17, 7, 33, 7,…
## $ Time        <dbl> 14.0, 115.5, 63.0, 14.0, 14.0, 24.5, 24.5, 24.5, 115.5, 24…
## $ Event       <int> 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1…
## $ Harvest     <chr> NA, NA, NA, NA, NA, NA, "X", "X", NA, NA, NA, NA, NA, NA, …
## $ Alive       <chr> NA, "X", NA, NA, NA, NA, NA, NA, "X", NA, "X", NA, NA, NA,…

R code

Preview of Original Data

tree_dat <- read.csv('Tree_Data.csv')
glimpse(tree_dat)
## Rows: 2,783
## Columns: 24
## $ No          <int> 126, 11, 12, 2823, 5679, 18, 25, 40, 26, 41, 30, 33, 34, 3…
## $ Plot        <int> 1, 1, 1, 7, 14, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Subplot     <chr> "C", "C", "C", "D", "A", "C", "A", "A", "A", "A", "C", "C"…
## $ Species     <chr> "Acer saccharum", "Quercus alba", "Quercus rubra", "Acer s…
## $ Light_ISF   <dbl> 0.106, 0.106, 0.106, 0.080, 0.060, 0.106, 0.108, 0.108, 0.…
## $ Light_Cat   <chr> "Med", "Med", "Med", "Med", "Low", "Med", "Med", "Med", "M…
## $ Core        <int> 2017, 2017, 2017, 2016, 2017, 2016, 2016, 2017, 2016, 2016…
## $ Soil        <chr> "Prunus serotina", "Quercus rubra", "Prunus serotina", "Pr…
## $ Adult       <chr> "I", "970", "J", "J", "689", "1332", "891", "1595", "1323"…
## $ Sterile     <chr> "Non-Sterile", "Non-Sterile", "Non-Sterile", "Non-Sterile"…
## $ Conspecific <chr> "Heterospecific", "Heterospecific", "Heterospecific", "Het…
## $ Myco        <chr> "AMF", "EMF", "EMF", "AMF", "AMF", "AMF", "EMF", "EMF", "E…
## $ SoilMyco    <chr> "AMF", "EMF", "AMF", "AMF", "AMF", "AMF", "EMF", "Sterile"…
## $ PlantDate   <chr> "6/11/18", "5/25/18", "5/31/18", "6/11/18", "6/11/18", "6/…
## $ AMF         <dbl> 22.00, 15.82, 24.45, 22.23, 21.15, 35.29, 24.00, 4.00, 28.…
## $ EMF         <dbl> NA, 31.07, 28.19, NA, NA, NA, 20.00, 0.00, 36.18, NA, 28.1…
## $ Phenolics   <dbl> -0.56, 5.19, 3.36, -0.71, -0.58, 0.30, 5.11, 3.43, 3.83, -…
## $ Lignin      <dbl> 13.86, 20.52, 24.74, 14.29, 10.85, 10.80, 18.82, 25.22, 26…
## $ NSC         <dbl> 12.15, 19.29, 15.01, 12.36, 11.20, 13.79, 22.51, 14.81, 14…
## $ Census      <int> 4, 33, 18, 4, 4, 7, 7, 7, 33, 7, 33, 21, 21, 17, 7, 33, 7,…
## $ Time        <dbl> 14.0, 115.5, 63.0, 14.0, 14.0, 24.5, 24.5, 24.5, 115.5, 24…
## $ Event       <int> 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1…
## $ Harvest     <chr> NA, NA, NA, NA, NA, NA, "X", "X", NA, NA, NA, NA, NA, NA, …
## $ Alive       <chr> NA, "X", NA, NA, NA, NA, NA, NA, "X", NA, "X", NA, NA, NA,…

Cleaned Data

Cleaned Data

Output

The cleaned data has less columns and is better for plotting/modeling.

  • Columns removed: Light_ISF, Core, Sterile, Adult, PlantDate, Census

  • Columns ‘Event’, ‘Harvest’, and ‘Alive’ merged into one column called ‘Event’ that indicates whether a tree died, was harvested, or was alive at the end of the study.

## Rows: 2,782
## Columns: 16
## $ No          <int> 126, 11, 12, 2823, 5679, 18, 25, 40, 26, 41, 30, 33, 34, 3…
## $ Plot        <int> 1, 1, 1, 7, 14, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Subplot     <chr> "C", "C", "C", "D", "A", "C", "A", "A", "A", "A", "C", "C"…
## $ Species     <chr> "Acer saccharum", "Quercus alba", "Quercus rubra", "Acer s…
## $ Light_Cat   <chr> "Med", "Med", "Med", "Med", "Low", "Med", "Med", "Med", "M…
## $ Soil        <chr> "Prunus serotina", "Quercus rubra", "Prunus serotina", "Pr…
## $ Conspecific <chr> "Heterospecific", "Heterospecific", "Heterospecific", "Het…
## $ Myco        <chr> "AMF", "EMF", "EMF", "AMF", "AMF", "AMF", "EMF", "EMF", "E…
## $ SoilMyco    <chr> "AMF", "EMF", "AMF", "AMF", "AMF", "AMF", "EMF", "Sterile"…
## $ AMF         <dbl> 22.00, 15.82, 24.45, 22.23, 21.15, 35.29, 24.00, 4.00, 28.…
## $ EMF         <dbl> NA, 31.07, 28.19, NA, NA, NA, 20.00, 0.00, 36.18, NA, 28.1…
## $ Phenolics   <dbl> -0.56, 5.19, 3.36, -0.71, -0.58, 0.30, 5.11, 3.43, 3.83, -…
## $ Lignin      <dbl> 13.86, 20.52, 24.74, 14.29, 10.85, 10.80, 18.82, 25.22, 26…
## $ NSC         <dbl> 12.15, 19.29, 15.01, 12.36, 11.20, 13.79, 22.51, 14.81, 14…
## $ Time        <dbl> 14.0, 115.5, 63.0, 14.0, 14.0, 24.5, 24.5, 24.5, 115.5, 24…
## $ Event       <chr> "Dead", "Alive", "Dead", "Dead", "Dead", "Dead", "Harvest"…

R code

The cleaned data has less columns and is better for plotting/modeling.

  • Columns removed: Light_ISF, Core, Sterile, Adult, PlantDate, Census

  • Columns ‘Event’, ‘Harvest’, and ‘Alive’ merged into one column called ‘Event’ that indicates whether a tree died, was harvested, or was alive at the end of the study

tree_dat_clean <- tree_dat %>%
  select(-c(Light_ISF, Core, Sterile, Adult, PlantDate, Census)) %>%
  mutate(Event = case_when(
    Event == '1' ~ 'X',
    Event == '0' ~ NA_character_)) %>%
  pivot_longer(cols = c(Event, Harvest, Alive),
               names_to = 'Event',
               values_to = 'Status') %>%
  mutate(Event = case_when(
    Event == 'Event' ~ 'Dead',
    TRUE ~ Event  )) %>%
  drop_na(Status) %>%
  select(-Status)

glimpse(tree_dat_clean)
## Rows: 2,782
## Columns: 16
## $ No          <int> 126, 11, 12, 2823, 5679, 18, 25, 40, 26, 41, 30, 33, 34, 3…
## $ Plot        <int> 1, 1, 1, 7, 14, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Subplot     <chr> "C", "C", "C", "D", "A", "C", "A", "A", "A", "A", "C", "C"…
## $ Species     <chr> "Acer saccharum", "Quercus alba", "Quercus rubra", "Acer s…
## $ Light_Cat   <chr> "Med", "Med", "Med", "Med", "Low", "Med", "Med", "Med", "M…
## $ Soil        <chr> "Prunus serotina", "Quercus rubra", "Prunus serotina", "Pr…
## $ Conspecific <chr> "Heterospecific", "Heterospecific", "Heterospecific", "Het…
## $ Myco        <chr> "AMF", "EMF", "EMF", "AMF", "AMF", "AMF", "EMF", "EMF", "E…
## $ SoilMyco    <chr> "AMF", "EMF", "AMF", "AMF", "AMF", "AMF", "EMF", "Sterile"…
## $ AMF         <dbl> 22.00, 15.82, 24.45, 22.23, 21.15, 35.29, 24.00, 4.00, 28.…
## $ EMF         <dbl> NA, 31.07, 28.19, NA, NA, NA, 20.00, 0.00, 36.18, NA, 28.1…
## $ Phenolics   <dbl> -0.56, 5.19, 3.36, -0.71, -0.58, 0.30, 5.11, 3.43, 3.83, -…
## $ Lignin      <dbl> 13.86, 20.52, 24.74, 14.29, 10.85, 10.80, 18.82, 25.22, 26…
## $ NSC         <dbl> 12.15, 19.29, 15.01, 12.36, 11.20, 13.79, 22.51, 14.81, 14…
## $ Time        <dbl> 14.0, 115.5, 63.0, 14.0, 14.0, 24.5, 24.5, 24.5, 115.5, 24…
## $ Event       <chr> "Dead", "Alive", "Dead", "Dead", "Dead", "Dead", "Harvest"…

Plotting Data

NSC against Lignin with Light_Cat as color

Output

In Acer saccharum and Prunus serotina, NSC and Lignin have a positive relationship. High and Med light levels seem to result in higher levels of both NSC and Lignin compared to Low light levels.

In Quercus alba and Quercus rubra, NSC and Lignin have a negative relationship. While NSC and Lignin levels are comparable across light levels in Q. alba, it seems that low light levels in Q. rubra may be associated with lower levels of NSC.

R code

In Acer saccharum and Prunus serotina, NSC and Lignin have a positive relationship. High and Med light levels seem to result in higher levels of both NSC and Lignin compared to Low light levels.

In Quercus alba and Quercus rubra, NSC and Lignin have a negative relationship. While NSC and Lignin levels are comparable across light levels in Q. alba, it seems that low light levels in Q. rubramay be associated with lower levels of NSC.

tree_dat_clean %>%
  ggplot(aes(x = NSC,
             y = Lignin,
             color = Light_Cat)) +
  geom_point() +
  facet_wrap(~Species, scales = 'free') + 
  theme_minimal() +
  ggtitle('NSC v Lignin')

Phenolics against AMF with Light_Cat as color

Output

Percent dry mass Phenolics and percent AMF have a positive relationship in Prunus serotina.

There also seems to be a trend of higher light categories having a higher percentage of Phenolics.

R code

Percent dry mass Phenolics and percent AMF have a positive relationship in Prunus serotina.

There also seems to be a trend of higher light categories having a higher percentage of Phenolics.

tree_dat_clean %>%
  ggplot(aes(x = AMF,
             y = Phenolics,
             color = Light_Cat)) +
  geom_point() +
  facet_wrap(~Species, scales = 'free') +
  theme_minimal() +
  ggtitle('AMF v Phenolics')

Phenolics against SoilMyco

Output

Q. alba seems to have lower Phenolic levels when in sterilized soil compared to soils with AMF or EMF. P. serotina also seems to have lower levels of Phenolics in sterilized soils.

R code

Q. alba seems to have lower Phenolic levels when in sterilized soil compared to soils with AMF or EMF. P. serotina also seems to have lower levels of Phenolics in sterilized soil.

tree_dat_clean %>%
  ggplot(aes(x = SoilMyco,
             y = Phenolics,
             color = SoilMyco)) +
  geom_boxplot() +
  facet_wrap(~Species, scales = 'free') +
  theme_minimal() +
  ggtitle('Phenolics v SoilMyco')

Light_Cat against Phenolics, Lignin, and NSC

Output

A. saccharum, P. serotina and Q. alba seem to have higher levels of Phenolics with more higher light. Q. rubra only seems to have higher Phenolic levels in the High light category and the difference is not as dramatic as it is in the other 3 species.

P. serotina seems to have higher levels of both Lignin and NSC when in High and Med light categories. Relationships between Light_Cat and Lignin and Light_Cat and NSC are more difficult to see in the other 3 species.

R code

A. saccharum, P. serotina and Q. alba seem to have higher levels of Phenolics with more higher light. Q. rubra only seems to have higher Phenolic levels in the High light category and the difference is not as dramatic as it is in the other 3 species.

P. serotina seems to have higher levels of both Lignin and NSC when in High and Med light categories. Relationships between Light_Cat and Lignin and Light_Cat and NSC are more difficult to see in the other 3 species.

tree_dat_clean %>%
  ggplot(aes(x = Light_Cat,
             y = Phenolics,
             color = Light_Cat)) +
  geom_jitter() +
  facet_wrap(~Species, scales = 'free') +
  theme_minimal() +
    ggtitle('Phenolics v Light_Cat')

tree_dat_clean %>%
  ggplot(aes(x = Light_Cat,
             y = Lignin,
             color = Light_Cat)) +
  geom_jitter() +
  facet_wrap(~Species, scales = 'free') +
    theme_minimal() +
    ggtitle('Lignin v Light_Cat')

tree_dat_clean %>%
  ggplot(aes(x = Light_Cat,
             y = NSC,
             color = Light_Cat)) +
  geom_jitter() +
  facet_wrap(~Species, scales = 'free') +
    theme_minimal() +
    ggtitle('NSC v Light_Cat')

Statistical Methods

idk if we’re gonna do this or not

Modeling Data

Choosing a Model

Output

I created 3 models, one to predict Phenolics, one to predict Lignin, and one to predict NSC. Through testing various models, I came to the conclusion that the formula for the best model was to predict the dependent variable (Phenolics, Lignin, or NSC) based on the interaction of Conspecific, Species, SoilMyco, and Light_Cat. (check r code for more info on models)

These are represented as mod1 for Phenolics, mod4 for Lignin, and mod7 for NSC.

The model for Phenolics was the most accurate out of the three as it had the highest r-squared and a relatively low AIC. The models for Lignin and NSC had pretty good r-squared values but also had very high AIC values, meaning they likely won’t be as good at predicting values.

## [1] "Model for Phenolics"
## # Comparison of Model Performance Indices
## 
## Name | Model |  AIC (weights) | AICc (weights) |  BIC (weights) |    R2 |  RMSE | Sigma
## ---------------------------------------------------------------------------------------
## mod1 |   glm | 1258.7 (>.999) | 1260.5 (>.999) | 1549.3 (>.999) | 0.977 | 0.298 | 0.301
## mod2 |   glm | 1636.1 (<.001) | 1640.1 (<.001) | 2069.1 (<.001) | 0.974 | 0.316 | 0.320
## mod3 |   glm | 1842.7 (<.001) | 1843.7 (<.001) | 2062.1 (<.001) | 0.971 | 0.333 | 0.335
## [1] "Model for Lignin"
## # Comparison of Model Performance Indices
## 
## Name | Model |   AIC (weights) |  AICc (weights) |   BIC (weights) |    R2 |  RMSE | Sigma
## ------------------------------------------------------------------------------------------
## mod4 |   glm | 11205.2 (>.999) | 11207.0 (>.999) | 11495.8 (>.999) | 0.931 | 1.781 | 1.797
## mod5 |   glm | 11448.6 (<.001) | 11449.6 (<.001) | 11668.0 (<.001) | 0.924 | 1.869 | 1.881
## mod6 |   glm | 11536.9 (<.001) | 11537.9 (<.001) | 11756.3 (<.001) | 0.922 | 1.899 | 1.911
## [1] "Model for NSC"
## # Comparison of Model Performance Indices
## 
## Name  | Model |   AIC (weights) |  AICc (weights) |   BIC (weights) |    R2 |  RMSE | Sigma
## -------------------------------------------------------------------------------------------
## mod7  |   glm | 11555.8 (>.999) | 11557.6 (>.999) | 11846.5 (>.999) | 0.805 | 1.897 | 1.914
## mod8  |   glm | 11829.2 (<.001) | 11830.2 (<.001) | 12048.6 (<.001) | 0.783 | 2.001 | 2.014
## mod9  |   glm | 11870.3 (<.001) | 11871.3 (<.001) | 12089.8 (<.001) | 0.780 | 2.016 | 2.029
## mod10 |   glm | 12247.4 (<.001) | 12247.5 (<.001) | 12300.8 (<.001) | 0.743 | 2.179 | 2.182

R code

I created 3 models, one to predict Phenolics, one to predict Lignin, and one to predict NSC. Through testing various models, I came to the conclusion that the formula for the best model was to predict the dependent variable (Phenolics, Lignin, or NSC) based on the interaction of Conspecific, Species, SoilMyco, and Light_Cat. (check r code for more info on models)

These are represented as mod1 for Phenolics, mod4 for Lignin, and mod7 for NSC.

The model for Phenolics was the most accurate out of the three as it had the highest r-squared and a relatively low AIC. The models for Lignin and NSC had pretty good r-squared values but also had very high AIC values, meaning they likely won’t be as good at predicting values.

print('Model for Phenolics')
## [1] "Model for Phenolics"
mod1 <- glm(data = tree_dat_clean, Phenolics ~ Conspecific * Species * SoilMyco * Light_Cat)
mod2 <- glm(data = tree_dat_clean, Phenolics ~ Conspecific * Species * AMF * Light_Cat)
mod3 <- glm(data = tree_dat_clean, Phenolics ~ Conspecific * Species * Light_Cat)

compare_performance(mod1, mod2, mod3)
## # Comparison of Model Performance Indices
## 
## Name | Model |  AIC (weights) | AICc (weights) |  BIC (weights) |    R2 |  RMSE | Sigma
## ---------------------------------------------------------------------------------------
## mod1 |   glm | 1258.7 (>.999) | 1260.5 (>.999) | 1549.3 (>.999) | 0.977 | 0.298 | 0.301
## mod2 |   glm | 1636.1 (<.001) | 1640.1 (<.001) | 2069.1 (<.001) | 0.974 | 0.316 | 0.320
## mod3 |   glm | 1842.7 (<.001) | 1843.7 (<.001) | 2062.1 (<.001) | 0.971 | 0.333 | 0.335
print('Model for Lignin')
## [1] "Model for Lignin"
mod4 <- glm(data = tree_dat_clean, Lignin ~ Conspecific * Species * SoilMyco * Light_Cat)
mod5 <- glm(data = tree_dat_clean, Lignin ~ Species * SoilMyco * Light_Cat)
mod6 <- glm(data = tree_dat_clean, Lignin ~ Conspecific * Species * Light_Cat)

compare_performance(mod4, mod5, mod6)
## # Comparison of Model Performance Indices
## 
## Name | Model |   AIC (weights) |  AICc (weights) |   BIC (weights) |    R2 |  RMSE | Sigma
## ------------------------------------------------------------------------------------------
## mod4 |   glm | 11205.2 (>.999) | 11207.0 (>.999) | 11495.8 (>.999) | 0.931 | 1.781 | 1.797
## mod5 |   glm | 11448.6 (<.001) | 11449.6 (<.001) | 11668.0 (<.001) | 0.924 | 1.869 | 1.881
## mod6 |   glm | 11536.9 (<.001) | 11537.9 (<.001) | 11756.3 (<.001) | 0.922 | 1.899 | 1.911
print('Model for NSC')
## [1] "Model for NSC"
mod7 <- glm(data = tree_dat_clean, NSC ~ Conspecific * Species * SoilMyco * Light_Cat)
mod8 <- glm(data = tree_dat_clean, NSC ~ Species * SoilMyco * Light_Cat)
mod9 <- glm(data = tree_dat_clean, NSC ~ Conspecific * Species * Light_Cat)
mod10 <- glm(data = tree_dat_clean, NSC ~ Conspecific + Species + Light_Cat)

compare_performance(mod7, mod8, mod9, mod10)
## # Comparison of Model Performance Indices
## 
## Name  | Model |   AIC (weights) |  AICc (weights) |   BIC (weights) |    R2 |  RMSE | Sigma
## -------------------------------------------------------------------------------------------
## mod7  |   glm | 11555.8 (>.999) | 11557.6 (>.999) | 11846.5 (>.999) | 0.805 | 1.897 | 1.914
## mod8  |   glm | 11829.2 (<.001) | 11830.2 (<.001) | 12048.6 (<.001) | 0.783 | 2.001 | 2.014
## mod9  |   glm | 11870.3 (<.001) | 11871.3 (<.001) | 12089.8 (<.001) | 0.780 | 2.016 | 2.029
## mod10 |   glm | 12247.4 (<.001) | 12247.5 (<.001) | 12300.8 (<.001) | 0.743 | 2.179 | 2.182

Phenolics Predictive Modeling

Output

For predictive modeling of Phenolics in tree seedlings, I first created a dataframe from my clean data (tree_dat_clean) that included predictions based on my chosen model. A preview of these predictions can be seen below. While not perfect, the predictions do match up relatively well.

I then created a new dataframe with 4 columns: Conspecific, Species, SoilMyco and Light_Cat

## # A tibble: 2,782 × 2
##    Phenolics    pred
##        <dbl>   <dbl>
##  1     -0.56 -0.348 
##  2      5.19  5.03  
##  3      3.36  3.46  
##  4     -0.71 -0.348 
##  5     -0.58 -0.481 
##  6      0.3   0.563 
##  7      5.11  5.03  
##  8      3.43  3.39  
##  9      3.83  3.46  
## 10     -0.05  0.0244
## # ℹ 2,772 more rows
## Joining with `by = join_by(Species, Light_Cat, Conspecific, SoilMyco,
## PredictionType)`
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

R code

Preview of Original Data

df_phenolics <- tree_dat_clean %>% 
  add_predictions(mod1) 
df_phenolics %>% dplyr::select("Phenolics","pred")
## # A tibble: 2,782 × 2
##    Phenolics    pred
##        <dbl>   <dbl>
##  1     -0.56 -0.348 
##  2      5.19  5.03  
##  3      3.36  3.46  
##  4     -0.71 -0.348 
##  5     -0.58 -0.481 
##  6      0.3   0.563 
##  7      5.11  5.03  
##  8      3.43  3.39  
##  9      3.83  3.46  
## 10     -0.05  0.0244
## # ℹ 2,772 more rows
newdf = data.frame(Conspecific = c('Conspecific', 'Heterospecific', 'Sterilized', NA), 
                   Species = c('Acer saccharum', 'Prunus serotina', 'Quercus alba', 'Quercus rubra'), 
                   SoilMyco = c('AMF', 'EMF', 'Sterile', NA), 
                   Light_Cat = c('Low', 'Med', 'High', NA))

pred_phenolics = predict(mod1, newdata = newdf)

hyp_preds_phenolics <- data.frame(Conspecific = newdf$Conspecific, 
                         Species = newdf$Species,
                        SoilMyco = newdf$SoilMyco,
                        Light_Cat = newdf$Light_Cat,
                        pred_phenolics = pred_phenolics)

df_phenolics$PredictionType <- "Real"
hyp_preds_phenolics$PredictionType <- "Hypothetical"

fullpreds_phenolics <- full_join(df_phenolics,hyp_preds_phenolics)
## Joining with `by = join_by(Species, Light_Cat, Conspecific, SoilMyco,
## PredictionType)`
ggplot(fullpreds_phenolics,aes(x=Light_Cat,y=pred, shape  = Conspecific)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = Phenolics, color = Conspecific), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_phenolics,aes(x=Light_Cat,y=pred, shape  = SoilMyco)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = Phenolics, color = SoilMyco), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_phenolics,aes(x=SoilMyco,y=pred, shape  = Light_Cat)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = Phenolics, color = Light_Cat), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_phenolics,aes(x=Conspecific,y=pred, shape  = Light_Cat)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = Phenolics, color = Light_Cat), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

Lignin Modeling

Output

Preview of Original Data

## # A tibble: 2,782 × 2
##    Lignin  pred
##     <dbl> <dbl>
##  1   13.9 13.6 
##  2   20.5 20.1 
##  3   24.7 25.5 
##  4   14.3 13.6 
##  5   10.8 11.3 
##  6   10.8  9.37
##  7   18.8 20.1 
##  8   25.2 24.5 
##  9   26.6 25.5 
## 10   13.3 13.6 
## # ℹ 2,772 more rows
## Joining with `by = join_by(Species, Light_Cat, Conspecific, SoilMyco,
## PredictionType)`
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

R code

Preview of Original Data

df_lignin <- tree_dat_clean %>% 
  add_predictions(mod4) 
df_lignin %>% dplyr::select("Lignin","pred")
## # A tibble: 2,782 × 2
##    Lignin  pred
##     <dbl> <dbl>
##  1   13.9 13.6 
##  2   20.5 20.1 
##  3   24.7 25.5 
##  4   14.3 13.6 
##  5   10.8 11.3 
##  6   10.8  9.37
##  7   18.8 20.1 
##  8   25.2 24.5 
##  9   26.6 25.5 
## 10   13.3 13.6 
## # ℹ 2,772 more rows
pred_lignin = predict(mod4, newdata = newdf)

hyp_preds_lignin <- data.frame(Conspecific = newdf$Conspecific, 
                                  Species = newdf$Species,
                                  SoilMyco = newdf$SoilMyco,
                                  Light_Cat = newdf$Light_Cat,
                                  pred_lignin = pred_lignin)

df_lignin$PredictionType <- "Real"
hyp_preds_lignin$PredictionType <- "Hypothetical"

fullpreds_lignin <- full_join(df_lignin,hyp_preds_lignin)
## Joining with `by = join_by(Species, Light_Cat, Conspecific, SoilMyco,
## PredictionType)`
ggplot(fullpreds_lignin,aes(x=Light_Cat,y=pred, shape  = Conspecific)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = Lignin, color = Conspecific), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_lignin,aes(x=Light_Cat,y=pred, shape  = SoilMyco)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = Lignin, color = SoilMyco), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_lignin,aes(x=SoilMyco,y=pred, shape  = Light_Cat)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = Lignin, color = Light_Cat), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_lignin,aes(x=Conspecific,y=pred, shape  = Light_Cat)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = Lignin, color = Light_Cat), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

NSC modeling

Output

Preview of Original Data

## # A tibble: 2,782 × 2
##      NSC  pred
##    <dbl> <dbl>
##  1  12.2  12.1
##  2  19.3  20.1
##  3  15.0  14.7
##  4  12.4  12.1
##  5  11.2  10.9
##  6  13.8  12.4
##  7  22.5  20.1
##  8  14.8  14.8
##  9  14.6  14.7
## 10  12.2  12.2
## # ℹ 2,772 more rows
## Joining with `by = join_by(Species, Light_Cat, Conspecific, SoilMyco,
## PredictionType)`
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

R code

Preview of Original Data

df_NSC <- tree_dat_clean %>% 
  add_predictions(mod7) 
df_NSC %>% dplyr::select("NSC","pred")
## # A tibble: 2,782 × 2
##      NSC  pred
##    <dbl> <dbl>
##  1  12.2  12.1
##  2  19.3  20.1
##  3  15.0  14.7
##  4  12.4  12.1
##  5  11.2  10.9
##  6  13.8  12.4
##  7  22.5  20.1
##  8  14.8  14.8
##  9  14.6  14.7
## 10  12.2  12.2
## # ℹ 2,772 more rows
pred_NSC = predict(mod7, newdata = newdf)

hyp_preds_NSC <- data.frame(Conspecific = newdf$Conspecific, 
                                  Species = newdf$Species,
                                  SoilMyco = newdf$SoilMyco,
                                  Light_Cat = newdf$Light_Cat,
                                  pred_NSC = pred_NSC)

df_NSC$PredictionType <- "Real"
hyp_preds_NSC$PredictionType <- "Hypothetical"

fullpreds_NSC <- full_join(df_NSC,hyp_preds_NSC)
## Joining with `by = join_by(Species, Light_Cat, Conspecific, SoilMyco,
## PredictionType)`
ggplot(fullpreds_NSC,aes(x=Light_Cat,y=pred, shape  = Conspecific)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = NSC, color = Conspecific), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_NSC,aes(x=Light_Cat,y=pred, shape  = SoilMyco)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = NSC, color = SoilMyco), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_NSC,aes(x=SoilMyco,y=pred, shape  = Light_Cat)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = NSC, color = Light_Cat), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(fullpreds_NSC,aes(x=Conspecific,y=pred, shape  = Light_Cat)) +
  geom_point(size = 2) +
  geom_jitter(aes(y = NSC, color = Light_Cat), alpha = .5) +
  theme_minimal() +
  facet_wrap(~Species, scales = 'free')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

Conclusions